INTRODUCTION


As a result of the availability of electronic computers many more relationships between dependent and independent variables are being examined. A specific problem brought to the attention of the author was the estimation of the coefficients $\theta_1$, $\theta_2$, $\theta_3$, and $\theta_4$ of the equation

$$y = \theta_1 e^{\theta_2 x} + \theta_3 e^{\theta_4 x}.$$

This equation has been used to describe the amount, y, of a drug in the blood of a human being after time x has elapsed (Wagner (1964)).

With this problem in mind the development of least squares is discussed. Several methods for obtaining least squares estimates, with emphasis on the non-linear case, are summarized. Some discussion concerning the statistical properties of least squares estimates is also presented in this paper.

HISTORICAL  DEVELOPMENT 

The  theory  of  least  squares  has  developed  over  the  past  243  years  and 
the  method  of  least  squares  as  a  method  of  estimating  coefficients  has 
been  in  use  for  over  160  years.    Merriman  (1877)  lists  408  titles  of  papers 
between  1722  and  1876  which  relate  to  least  squares.    Todhunter  (1865)  is 
another  work  which  considers  the  early  development  of  least  squares. 

The  method  of  least  squares  as  a  method  of  estimating  coefficients 
was  first  stated  by  Legendre  (1806).    This  formulation  is  described  below. 

Observations which are assumed to be functions of other variables and coefficients frequently can be assumed to be related by a set of linear equations:

$$y_i = \theta_1 x_{i1} + \cdots + \theta_k x_{ik}, \qquad i = 1, \ldots, s.$$

The $\theta_j$, $j = 1, \ldots, k$, are the unknown coefficients to be estimated. The $x_{ij}$, $i = 1, \ldots, s$, $j = 1, \ldots, k$, are known quantities, and the $y_i$, $i = 1, \ldots, s$, are observations which are subject to error and estimate the true value of the function, $Y_i$, say. Legendre called this set of linear equations "the equations of condition." Usually $s > k$ and the equations are independent.

It is desired to find values of the coefficients $\theta_1, \ldots, \theta_k$ such that these equations are satisfied as nearly as possible. To "satisfy as nearly as possible" means to minimize the errors, or possibly a function of the errors, $e_1, \ldots, e_s$, where

$$e_i = \theta_1 x_{i1} + \cdots + \theta_k x_{ik} - y_i, \qquad i = 1, \ldots, s.$$

That is, the differences between the observed values and the functional values, or some function of these differences, is to be minimized.

Legendre proposed that $e_1^2 + \cdots + e_s^2$, the sum of the squared errors, be minimized. A solution $\theta_1, \ldots, \theta_k$ is obtained by setting

$$\frac{\partial}{\partial \theta_j} \sum_{i=1}^{s} e_i^2 = 0, \qquad j = 1, \ldots, k.$$

These resulting equations Legendre denoted as the normal equations.
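The formation and solution of the normal equations may be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data, the straight-line model with k = 2 coefficients, and the helper names `normal_equations` and `solve` are hypothetical and do not come from Legendre or from the sources cited here.

```python
def normal_equations(x_rows, y):
    """Form X'X and X'y for the equations of condition y_i = sum_j theta_j * x_ij."""
    k = len(x_rows[0])
    xtx = [[sum(r[a] * r[b] for r in x_rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(x_rows, y)) for a in range(k)]
    return xtx, xty


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Hypothetical data: s = 4 observations, k = 2 coefficients (intercept and slope).
x_rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.1, 2.9, 5.2, 6.8]

xtx, xty = normal_equations(x_rows, y)
theta = solve(xtx, xty)

# Errors e_i = theta_1 x_i1 + theta_2 x_i2 - y_i at the solution.
residuals = [sum(t * v for t, v in zip(theta, r)) - yi for r, yi in zip(x_rows, y)]
```

At the solution each column of known quantities is orthogonal to the vector of errors, which is another way of stating that the partial derivatives of the sum of squared errors vanish.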

JUSTIFICATION  OF  LEAST  SQUARES 
Gauss first attempted to justify the use of the method of least squares by demonstrating that the errors of observation for a quantity to be estimated are distributed normally and that, if this is true, the most probable estimate is the estimate obtained by the method of least squares. This justification by Gauss is somewhat intuitive. It is presented, however, because it is one of the first attempts to justify the use of the method of least squares.

This early justification assumes the postulate of the arithmetic mean (Whittaker and Robinson (1924)): "When any number of equally good direct observations M, M', M'', . . . of an unknown magnitude x are given, the most probable value for x is their arithmetic mean." Gauss deduced this postulate from four elementary axioms.

Axiom  I  -  The  differences  between  the  most  probable  value 
and  the  individual  measures  do  not  depend  on  the  position  of  the 
null  point  from  which  they  are  reckoned. 

Axiom II - The ratio of the most probable value to any individual measure does not depend on the unit in terms of which measures are reckoned.

Axiom III - The most probable value is independent of the order in which the measurements are made, and so is a symmetric function of the measures.

Axiom IV - The most probable value, regarded as a function of the individual measures, has one-valued and continuous first derivatives with respect to them.

The first two axioms propose the most probable value to be invariant under a linear transformation. The third axiom means the observations are a random sample. The fourth axiom explains itself.

Suppose on some small interval between $\Delta$ and $\Delta + d\Delta$ the probability of error is $\phi(\Delta)\,d\Delta$. Then $\phi(\Delta)$ is the relative frequency of error. Let $\varepsilon$ denote the least possible error for any one measurement. The case is being considered where $d\Delta = \varepsilon$ and the probability of error associated with any $\Delta$ is $\phi(\Delta)\varepsilon$.

If $M, M', M'', \ldots, M^{(s)}$ are s observations which are used to calculate a quantity x whose true value is p, the errors of observation are

$$\Delta = M - p, \quad \Delta' = M' - p, \quad \ldots, \quad \Delta^{(s)} = M^{(s)} - p.$$

The probability of a certain sequence $M, M', M'', \ldots, M^{(s)}$ occurring is $\varepsilon^s \phi(M - p)\phi(M' - p) \cdots \phi(M^{(s)} - p)$. To illustrate, consider the problem of estimating the length of a desk with true length p. The accuracy of the yardstick being used is ¼ inch. Therefore $\varepsilon$ = ¼ inch if there is no human error. The s measurements $M, M', M'', \ldots$ are measurements of a quantity x, which estimates p, such that $E(x) = p$. The probability of a certain sequence $M, M', \ldots$ is $(\tfrac{1}{4})^s \phi(M - p)\phi(M' - p) \cdots$, where $\phi(M - p) \cdot \tfrac{1}{4}$ is the probability of an error of size $M - p$.

An expression for the probability of the true value of x lying between p and p + dp can be obtained by considering Bayes' Theorem,

$$P(A_i \mid B) = \frac{P(B \mid A_i)P(A_i)}{\sum_j P(B \mid A_j)P(A_j)}.$$

For this problem $A_i$ is the event that x, for the given sequence, lies between p and p + dp. The event B can be considered as the particular sequence of observations. In words, the probability that the true x is between p and p + dp for a given sequence of observations is equal to the probability of a certain sequence given that x is between p and p + dp multiplied by the probability x is between p and p + dp, all divided by the sum of the above product for all possible values of x.
The $P(B \mid A_i)$ is $\varepsilon^s \phi(M - p)\phi(M' - p) \cdots$ and $P(A_i)$ is dp. The right-hand side of the expression above could be written in the notation previously introduced as

$$\frac{\phi(M - p)\phi(M' - p) \cdots dp}{\int \phi(M - p)\phi(M' - p) \cdots dp},$$

the common factor $\varepsilon^s$ cancelling.

To maximize the probability that x is between p and p + dp, the value of x which maximizes $\phi(M - x)\phi(M' - x) \cdots$ should be adopted. The function $\phi(M - x)\phi(M' - x) \cdots$ being positive, the maximum of its logarithm will occur at the same point as the maximum of the function itself. Therefore that value of x for which

$$\sum \frac{d \log \phi(M - x)}{dx} = 0$$

is desired. But, assuming the postulate of the arithmetic mean, the most probable value for x is the arithmetic mean. Then $x = (1/s)(M + M' + \cdots)$, and $\sum (M - x) = 0$. Therefore

$$\sum \frac{d}{dx} \log \phi(M - x) = c \sum (M - x) = 0$$

and $\frac{d}{dx} \log \phi(M - x) = c(M - x)$ for c a constant. Performing the integration, $\log \phi(M - x) = -c(M - x)^2/2 + b$ and, therefore, $\phi(M - x) = A e^{-c(M - x)^2/2}$, where A is a constant.

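Since the sum of the probabilities of all the errors is unity, A can be found as $\sqrt{c/2\pi}$. Writing $h = \sqrt{c/2}$ one has

$$\phi(\Delta) = \frac{h}{\sqrt{\pi}} e^{-h^2 \Delta^2},$$

which means the distribution of measurements about the true value is a normal frequency distribution. The probability of a certain set of errors, when the i-th observation has its own constant $h_i$, therefore, is

$$\varepsilon^s \pi^{-s/2} h_1 h_2 \cdots h_s \exp\{-[h_1^2 (M - p)^2 + \cdots + h_s^2 (M^{(s)} - p)^2]\}.$$

The most probable value of x is that which maximizes the above expression if p is replaced by x, or it is that value of x which minimizes $h_1^2 (M - x)^2 + \cdots + h_s^2 (M^{(s)} - x)^2$, a sum of squared deviations. Then

$$x = \frac{h_1^2 M + \cdots + h_s^2 M^{(s)}}{h_1^2 + \cdots + h_s^2}.$$

Suppose $w_i = h_i^2$, $i = 1, \ldots, s$; then

$$x = \frac{w_1 M + \cdots + w_s M^{(s)}}{w_1 + \cdots + w_s}$$

is the least squares solution.

The quantities $w_1, \ldots, w_s$ are called the weights of the observations $M, M', M'', \ldots$, and frequently the sum of the $w_i$ is set equal to W.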
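That the weighted mean minimizes the weighted sum of squared deviations can be checked numerically. The short Python sketch below uses hypothetical measurements and weights; the function names are illustrative only.

```python
def weighted_mean(measurements, weights):
    """x = (w_1 M + ... + w_s M^(s)) / (w_1 + ... + w_s)."""
    return sum(w * m for w, m in zip(weights, measurements)) / sum(weights)


def weighted_sum_of_squares(x, measurements, weights):
    """w_1 (M - x)^2 + ... + w_s (M^(s) - x)^2."""
    return sum(w * (m - x) ** 2 for w, m in zip(weights, measurements))


# Hypothetical observations M, M', M'' and weights w_i = h_i^2.
measurements = [10.2, 9.8, 10.1]
weights = [1.0, 4.0, 2.0]

x_hat = weighted_mean(measurements, weights)
best = weighted_sum_of_squares(x_hat, measurements, weights)

# Moving x slightly to either side of the weighted mean increases the criterion.
nearby = [weighted_sum_of_squares(x_hat + d, measurements, weights) for d in (-0.01, 0.01)]
```

The observation with the largest weight (here the second) pulls the estimate most strongly toward itself.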

Gauss again justified the use of the method of least squares with a less heuristic argument in 1821. This justification has been summarized by Plackett (1949), who emphasizes that this justification is very different from the justification of Laplace. Plackett used matrix notation to summarize these results as defined below.

Let $\theta$ (s x 1) be a vector of unknown coefficients, y (n x 1) a vector of observations, e (n x 1) a vector of errors and X (n x s) a matrix of known quantities. A set of equations relating these quantities can be expressed as $X\theta = y - e$. Further, assume that W (n x n) is a diagonal matrix whose elements are the reciprocals of the variances of the elements of the e vector. The method of least squares leads to estimates of the coefficients which satisfy the relationship $X'WX\theta^* = X'Wy$. That is, $\theta^* = (X'WX)^{-1}X'Wy$, where $\theta^*$ is the vector which estimates $\theta$.
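A minimal Python sketch of the weighted estimate for the special case of two coefficients follows. The data and the variances assigned to the errors are hypothetical, and the 2 x 2 system is solved directly by Cramer's rule rather than by a general matrix inverse.

```python
def weighted_least_squares(x_rows, y, w):
    """Solve X'WX theta = X'Wy for two coefficients, with W = diag(w)."""
    a11 = sum(wi * r[0] * r[0] for r, wi in zip(x_rows, w))
    a12 = sum(wi * r[0] * r[1] for r, wi in zip(x_rows, w))
    a22 = sum(wi * r[1] * r[1] for r, wi in zip(x_rows, w))
    b1 = sum(wi * r[0] * yi for r, yi, wi in zip(x_rows, y, w))
    b2 = sum(wi * r[1] * yi for r, yi, wi in zip(x_rows, y, w))
    det = a11 * a22 - a12 * a12
    return [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]


# Hypothetical data; the weights are the reciprocals of assumed error variances.
x_rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.1, 2.9, 5.2, 6.8]
variances = [0.1, 0.1, 0.4, 0.4]
w = [1.0 / v for v in variances]

theta_weighted = weighted_least_squares(x_rows, y, w)
theta_equal = weighted_least_squares(x_rows, y, [1.0] * len(y))  # W = I
```

With W = I the estimate reduces to the ordinary least squares solution; with unequal weights the more precise observations dominate the fit.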

Gauss's 1821 justification was written in Latin and a French translation was published by Bertrand in 1855. The following represents this justification as summarized by Plackett (1949).

From the above statement $\theta^* = (X'WX)^{-1}X'Wy$ let W = I, the identity matrix. If $\theta^* = By$ is unbiased and E(e) = 0 for all $\theta$, then $E(\theta^*) = E(By) = BE(y) = BE(X\theta + e) = BX\theta = \theta$. Therefore BX = I. Let S = X'X. It follows that $S^{-1} = BXS^{-1}$ and $BB' = (S^{-1}X')(S^{-1}X')' + (B - S^{-1}X')(B - S^{-1}X')'$, and the diagonal elements of BB' are least when $B = S^{-1}X'$, which is the least squares solution.

This important theorem was discussed by Markoff in 1912 and the theorem was extended by Aitken in 1934 to consider not only the diagonal elements as in W, but the entire variance-covariance matrix.

Laplace  in  1811  established  the  method  of  least  squares  in  a  different 
manner.    Plackett  (1949)  summarized  this  and  the  work  of  Laplace  from 
1812-1820  concerning  least  squares  in  the  following  way. 

Among all s x n matrices F leading to estimates of the form $FX\theta^* = Fy$, the expected values of the elements $|\theta - \theta^*|$ are minimized as $n \to \infty$ when $F = \lambda X'W$, $\lambda$ being an arbitrary multiplier.

In other words the solution of the equation $\lambda X'WX\theta^* = \lambda X'Wy$ for the estimate $\theta^*$ will be an unbiased estimate of $\theta$. As noted earlier the above estimate for $\theta$ is the least squares estimate.


PROPERTIES  OF  ESTIMATES  IN  THE  LINEAR  MODEL 

If one considers the general linear hypothesis model of full rank, that is, the model $X\theta = y - e$, where the matrix X is of full rank, then the method of least squares gives rise, with two additional assumptions, to estimates whose statistical properties are given in the Gauss-Markoff theorem. Because of the importance of this theorem, both the theorem and a proof are given below, modified after Graybill (1961).

If the general-linear-hypothesis model of full rank $y = X\theta + e$ is such that the following two conditions on the random vector e are met:

1) $E(e) = 0$

2) $E(ee') = \sigma^2 I$,

the best (minimum-variance) linear (linear function of the elements of y) unbiased estimate of $\theta$ is given by least squares.

The proof is as follows. Let B be any s x n constant matrix and let $\theta^* = By$. Suppose $B = S^{-1}X' + A$, where S = X'X. $S^{-1}X'$ is known, but A must be found in order to specify B. For unbiasedness, $E(\theta^*) = \theta$. Therefore $E(\theta^*) = E[(S^{-1}X' + A)y] = E(S^{-1}X'y) + E(Ay) = S^{-1}X'X\theta + E(Ay) = \theta + AX\theta$. Thus to be unbiased, AX = 0.

For the property of minimum variance the matrix A must be found that minimizes the variance of $\theta^*$ subject to the restriction AX = 0.

Consider the covariance of $\theta^*$,

$$\operatorname{cov}(\theta^*) = E[(\theta^* - \theta)(\theta^* - \theta)'] = E\{[(S^{-1}X' + A)y - \theta][(S^{-1}X' + A)y - \theta]'\}.$$

Substituting $X\theta + e$ for y and recalling that AX = 0,

$$\operatorname{cov}(\theta^*) = E(S^{-1}X'ee'XS^{-1} + Aee'A' + S^{-1}X'ee'A' + Aee'XS^{-1}) = \sigma^2(S^{-1} + AA').$$

Let $AA' = G = (g_{ij})$. To minimize var($\theta^*$), the diagonal elements of $\sigma^2(S^{-1} + AA')$ must be minimized. Since $\sigma^2$ and $S^{-1}$ are constants, the diagonal elements of G must be minimized. But G is a positive semidefinite matrix and hence $g_{ii} \ge 0$. The diagonal elements attain a minimum when $g_{ii} = 0$. Since $g_{ii}$ is the sum of the squares of the elements in the i-th row of A, this implies that all the elements of A are zero and A = 0. Therefore $B = S^{-1}X'$ and $\theta^* = S^{-1}X'y$ is the least squares estimate for $\theta$.
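The decomposition used in the proof can be verified numerically. In the Python sketch below, X and A are hypothetical; A has been chosen by hand so that AX = 0, and $\sigma^2$ is taken as 1 so that the variances of the estimates are the diagonal elements of BB'.

```python
def matmul(a, b):
    """Matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def variances(b):
    """Diagonal of BB', i.e. the variances of the elements of By when sigma^2 = 1."""
    bbt = matmul(b, transpose(b))
    return [bbt[i][i] for i in range(len(bbt))]


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # hypothetical, full rank
Xt = transpose(X)
S_inv = inverse_2x2(matmul(Xt, X))
B_ls = matmul(S_inv, Xt)                                # least squares: S^{-1} X'

# A hypothetical non-zero A whose rows are orthogonal to the columns of X (AX = 0).
A = [[1.0, -2.0, 1.0, 0.0], [0.0, 1.0, -2.0, 1.0]]
B_alt = [[b + a for b, a in zip(rb, ra)] for rb, ra in zip(B_ls, A)]

var_ls = variances(B_ls)     # diagonal of S^{-1}
var_alt = variances(B_alt)   # diagonal of S^{-1} + AA'
```

Both B_ls and B_alt give unbiased estimates, but every diagonal element for B_alt exceeds the corresponding element for B_ls by the matching diagonal element of AA'.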

Aitken (1948) generalized this theorem to consider the situation where $E(ee') = \sigma^2 V$, V being a completely general matrix. A discussion of this topic can be found in Kendall and Stuart (1961).

Plackett (1950) considers the case when X is of less than full rank. The minimum-variance, unbiased solution for the estimate of $\theta$ may be obtained by modifying the method of least squares. Plackett's 1950 paper also includes a procedure, which requires a minimum of calculation, for estimating the coefficients and sums of squares when additional observations occur.

If the additional assumption that the vector of errors is normally distributed can be made, the least squares estimates and the estimates obtained by the method of maximum likelihood are identical for the general linear hypothesis of full rank. These estimates have the statistical properties of being consistent, efficient, unbiased, sufficient, complete, and of minimum variance (Graybill (1961)).


FIRST ORDER TAYLOR APPROXIMATION OF NON-LINEAR FUNCTIONS


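Suppose it is required to find estimates of the coefficients $\theta_1$ and $\theta_2$ from a set of equations $f_i(\theta_1, \theta_2) = y_i$, $i = 1, \ldots, s$, where the $f_i(\theta_1, \theta_2)$ are known functions not necessarily linear in $\theta_1$ and $\theta_2$ and $y_1, \ldots, y_s$ are observations subject to error.

Whittaker and Robinson (1924) first suggested an approximation by a first order Taylor series,

$$f_i(\theta_{10}, \theta_{20}) + \frac{\partial f_i}{\partial \theta_1}\Delta_1 + \frac{\partial f_i}{\partial \theta_2}\Delta_2 = y_i, \qquad i = 1, \ldots, s,$$

where $\theta_{10}$ and $\theta_{20}$ are initial approximations for $\theta_1$ and $\theta_2$, the partial derivatives are evaluated at these approximations, and $\Delta_1 = \theta_1 - \theta_{10}$, $\Delta_2 = \theta_2 - \theta_{20}$. The resulting series of equations can be solved for values of $\Delta_1$ and $\Delta_2$ by the method of least squares. The initial approximations can be corrected by these values and used as new approximations for the coefficients. The equations then can be solved for new values of $\Delta_1$ and $\Delta_2$. The iteration continues until $\Delta_1$ and $\Delta_2$ equal zero. Convergence to solutions is not guaranteed. It has been assumed that the first order approximation is an appropriate approximation for the original equation.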
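The iteration just described may be sketched in Python for the hypothetical function $f_i(\theta_1, \theta_2) = \theta_1 e^{\theta_2 x_i}$. The data (generated without error from $\theta_1 = 2$, $\theta_2 = -0.5$), the starting values, the stopping tolerance, and the function name are all assumptions made for the illustration; in practice the corrections are judged negligible rather than exactly zero.

```python
import math


def taylor_iteration(x, y, theta1, theta2, tol=1e-12, max_iter=50):
    """Fit y ~ theta1 * exp(theta2 * x) by repeated first order Taylor corrections."""
    for _ in range(max_iter):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for xi, yi in zip(x, y):
            e = math.exp(theta2 * xi)
            r = yi - theta1 * e      # y_i - f_i at the current approximation
            d1 = e                   # partial derivative of f_i w.r.t. theta1
            d2 = theta1 * xi * e     # partial derivative of f_i w.r.t. theta2
            a11 += d1 * d1
            a12 += d1 * d2
            a22 += d2 * d2
            b1 += d1 * r
            b2 += d2 * r
        # Normal equations of the linearized problem, solved by Cramer's rule.
        det = a11 * a22 - a12 * a12
        delta1 = (a22 * b1 - a12 * b2) / det
        delta2 = (a11 * b2 - a12 * b1) / det
        theta1 += delta1
        theta2 += delta2
        if max(abs(delta1), abs(delta2)) < tol:
            break
    return theta1, theta2


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(-0.5 * xi) for xi in x]   # hypothetical error-free observations
theta1_hat, theta2_hat = taylor_iteration(x, y, 1.5, -0.3)
```

For these starting values the corrections shrink rapidly; with poorer starting values the same loop may fail to converge, as noted above.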

ESTIMATION UTILIZING THE JACOBIAN

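Wynn (1962) considers a special non-linear function and gives an organized treatment of this function applying the first order Taylor approximation. If the coefficients to be estimated occur in the relationship

$$y = \theta_0 + \sum_{j=1}^{h} \theta_{1j} e^{\theta_{2j} x}$$

and one lets

$$e_i = y_i - \theta_0 - \sum_{j=1}^{h} \theta_{1j} e^{\theta_{2j} x_i}$$

and $S = \sum_i e_i^2$, then S is to be a minimum.

By taking partial derivatives of S with respect to the coefficients one can derive the 2h + 1 equations

$$\phi_0 = \frac{\partial S}{\partial \theta_0} = -2\sum_i e_i,$$

$$\phi_{1j} = \frac{\partial S}{\partial \theta_{1j}} = -2\sum_i e_i e^{\theta_{2j} x_i},$$

$$\phi_{2j} = \frac{\partial S}{\partial \theta_{2j}} = -2\sum_i e_i \theta_{1j} x_i e^{\theta_{2j} x_i}, \qquad j = 1, \ldots, h.$$

Let the sequence of estimates be related by the equations

$$\theta_0^{(r+1)} = \theta_0^{(r)} + \Delta\theta_0^{(r)}, \quad \theta_{1j}^{(r+1)} = \theta_{1j}^{(r)} + \Delta\theta_{1j}^{(r)}, \quad \theta_{2j}^{(r+1)} = \theta_{2j}^{(r)} + \Delta\theta_{2j}^{(r)},$$

and let the notation

$$\phi_u^{(r)} = \phi_u(\theta_0^{(r)}, \theta_{11}^{(r)}, \ldots, \theta_{2h}^{(r)}), \qquad u = 0, 1, \ldots, 2h,$$

be adopted. $\phi_u^{(r)}$ is one of the 2h + 1 partial derivatives above evaluated at the r-th estimate of the coefficients. The notation for $\phi_u^{(r)}$ is adopted to convey this meaning.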


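Neglecting second and higher order terms one obtains

$$\begin{pmatrix} \phi_0^{(r+1)} \\ \phi_1^{(r+1)} \\ \vdots \\ \phi_{2h}^{(r+1)} \end{pmatrix} = \begin{pmatrix} \phi_0^{(r)} \\ \phi_1^{(r)} \\ \vdots \\ \phi_{2h}^{(r)} \end{pmatrix} + J^{(r)} \begin{pmatrix} \Delta\theta_0^{(r)} \\ \Delta\theta_{11}^{(r)} \\ \vdots \\ \Delta\theta_{2h}^{(r)} \end{pmatrix}.$$

The dimensions of this operation are

$$[(2h + 1) \times 1] = [(2h + 1) \times 1] + [(2h + 1) \times (2h + 1)][(2h + 1) \times 1].$$

The vectors in the equation are column vectors and J is the Jacobian which can be expressed as

$$J = \begin{pmatrix} \partial\phi_0/\partial\theta_0 & \partial\phi_0/\partial\theta_{11} & \cdots & \partial\phi_0/\partial\theta_{2h} \\ \vdots & \vdots & & \vdots \\ \partial\phi_{2h}/\partial\theta_0 & \partial\phi_{2h}/\partial\theta_{11} & \cdots & \partial\phi_{2h}/\partial\theta_{2h} \end{pmatrix},$$

and $J^{(r)}$ implies the Jacobian evaluated at the r-th set of estimates.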


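If the set $\phi_u^{(r)}$ is computed by the derived equations and the values $\Delta\theta_0^{(r)}, \Delta\theta_{11}^{(r)}, \ldots, \Delta\theta_{2h}^{(r)}$ are required which make $\phi_u^{(r+1)}$ equal to zero, then from the above equation

$$(\Delta\theta_0^{(r)}, \Delta\theta_{11}^{(r)}, \ldots, \Delta\theta_{2h}^{(r)})' = (-J^{(r)})^{-1} (\phi_0^{(r)}, \phi_1^{(r)}, \ldots, \phi_{2h}^{(r)})'.$$

Improved values of $\theta_0$, $\theta_{1j}$, $\theta_{2j}$ are obtained by iteration.

An alternate procedure which demands less computation is to compute $J^{(0)}$ and invert this matrix. The iteration then proceeds in conjunction with

$$(\Delta\theta_0^{(r)}, \Delta\theta_{11}^{(r)}, \ldots, \Delta\theta_{2h}^{(r)})' = (-J^{(0)})^{-1} (\phi_0^{(r)}, \phi_1^{(r)}, \ldots, \phi_{2h}^{(r)})'.$$

Although less computation is necessary, since the Jacobian is not inverted with each iteration, this latter technique will converge much more slowly than if the Jacobian is calculated and inverted each time.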
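The difference between the two procedures can be seen in a deliberately small Python sketch. It is not Wynn's function: as an assumption made only for illustration, a single coefficient is fitted in $y = e^{\theta x}$, so that the set of partial derivatives reduces to one equation and the Jacobian to a 1 x 1 quantity. The data, starting value, and function names are hypothetical.

```python
import math


def phi(theta, x, y):
    """dS/dtheta for S = sum (y_i - exp(theta * x_i))^2."""
    return -2.0 * sum(xi * math.exp(theta * xi) * (yi - math.exp(theta * xi))
                      for xi, yi in zip(x, y))


def jacobian(theta, x, y):
    """d(phi)/dtheta, the 1 x 1 Jacobian of the single equation phi = 0."""
    return 2.0 * sum(xi * xi * math.exp(theta * xi) * (2.0 * math.exp(theta * xi) - yi)
                     for xi, yi in zip(x, y))


def iterate(theta, x, y, refresh, tol=1e-12, max_iter=500):
    """Apply delta = -phi / J, recomputing J each time only if refresh is True."""
    j = jacobian(theta, x, y)          # J^(0)
    for r in range(1, max_iter + 1):
        if refresh:
            j = jacobian(theta, x, y)  # J^(r)
        step = -phi(theta, x, y) / j
        theta += step
        if abs(step) < tol:
            return theta, r
    return theta, max_iter


x = [0.0, 1.0, 2.0]
y = [math.exp(-0.5 * xi) for xi in x]   # hypothetical error-free observations

theta_full, n_full = iterate(-0.3, x, y, refresh=True)
theta_frozen, n_frozen = iterate(-0.3, x, y, refresh=False)
```

Both variants reach the same estimate in this example, but the version that keeps $J^{(0)}$ needs many more iterations, in line with the remark above.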

WEIGHTING AND HIGHER ORDER TAYLOR APPROXIMATIONS

It has been observed that in some problems the first order Taylor approximation fails to improve the initial solution (Wilson and Puffer (1933)). Some alternatives to the first order approximation are considered.

Consider the non-linear equation $Y_i = \theta_1 e^{\theta_2 x_i}$, where $\theta_1$ and $\theta_2$ are the coefficients to be estimated. If the method of least squares is used, $\sum (Y_i - y_i)^2$ is to be minimized, where the $y_i$ are observations which estimate $Y_i$. The problem is that $\theta_1$ enters linearly but $\theta_2$ enters non-linearly. If one utilizes logarithms and minimizes $\sum (\theta_2 x_i + \log \theta_1 - \log y_i)^2$, $\theta_2$ enters linearly and $\theta_1$ enters non-linearly. Although the adoption of logarithms for this type of problem is not uncommon, it does not necessarily provide a better solution.

If one can assume that the error of observation is relatively small, the difference $y_i - Y_i$ could be considered as a differential of Y. Since $d \log Y = dY/Y$, $\sum (y_i - Y_i)^2$ is approximated by either $\sum (\log y_i - \log Y_i)^2 Y_i^2$ or $\sum (\log y_i - \log Y_i)^2 y_i^2$ if $y_i$ and $Y_i$ are nearly equal. Of the two forms, Wilson and Puffer (1933) point out that the second form is quadratic in the unknown coefficients and also leads to linear normal equations.
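Both the plain logarithmic criterion and the form weighted by $y_i^2$ lead to a straight-line fit of $\log y_i$ on $x_i$, with $\log \theta_1$ as the intercept and $\theta_2$ as the slope. The Python sketch below illustrates this; the data and function names are hypothetical, and all observations are assumed to be positive so that the logarithm exists.

```python
import math


def weighted_line_fit(x, z, w):
    """Minimize sum w_i (a + b x_i - z_i)^2 and return (a, b)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sz = sum(wi * zi for wi, zi in zip(w, z))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    det = sw * sxx - sx * sx
    b = (sw * sxz - sx * sz) / det
    a = (sz - b * sx) / sw
    return a, b


def fit_exponential(x, y, weighted):
    """Estimate theta1, theta2 in Y = theta1 * exp(theta2 * x) from log y."""
    z = [math.log(yi) for yi in y]
    # weighted=True uses the weights y_i^2; otherwise all weights are one.
    w = [yi * yi for yi in y] if weighted else [1.0] * len(y)
    a, b = weighted_line_fit(x, z, w)
    return math.exp(a), b


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.05, 1.18, 0.75, 0.43, 0.29]     # hypothetical observations

plain = fit_exponential(x, y, weighted=False)
weighted = fit_exponential(x, y, weighted=True)
```

The two estimates generally differ: the unweighted logarithmic fit gives the small observations as much influence as the large ones, while the weights $y_i^2$ restore approximately the emphasis of the original criterion $\sum (y_i - Y_i)^2$.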

A third method of solving non-linear equations is to expand the function by a Taylor series of order greater than one.

"DAMPING" THE NORMAL EQUATIONS

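Levenberg (1944) modified the method of estimating coefficients utilizing the first order Taylor approximation.

Let $g_i(\theta) = f_i(\theta^*) - f_i(\theta)$. The least squares criterion requires that $S(\theta) = \sum g_i^2(\theta)$ be minimized. The function $g_i(\theta)$ is approximated by its first order Taylor expansion,

$$g_i(\theta) \approx G_i(\theta) = g_i({}_0\theta) + \frac{\partial g_i}{\partial \theta_1}\Delta_1 + \frac{\partial g_i}{\partial \theta_2}\Delta_2 + \cdots.$$

Using this approximating function $G_i(\theta)$, it is now required to minimize $S(\theta) = \sum G_i^2(\theta)$, where ${}_0\theta$ is an initial approximation vector and $\Delta_j = \theta_j - {}_0\theta_j$.

By taking partial derivatives of $\sum G_i^2(\theta)$ with respect to the various coefficients and setting the resulting expressions equal to zero the following equations may be obtained, where $g_i$ denotes $g_i({}_0\theta)$.

$$\Delta_1 \sum \left(\frac{\partial G_i}{\partial \theta_1}\right)^2 + \Delta_2 \sum \frac{\partial G_i}{\partial \theta_1}\frac{\partial G_i}{\partial \theta_2} + \cdots + \sum \frac{\partial G_i}{\partial \theta_1} g_i = 0$$

$$\Delta_1 \sum \frac{\partial G_i}{\partial \theta_1}\frac{\partial G_i}{\partial \theta_2} + \Delta_2 \sum \left(\frac{\partial G_i}{\partial \theta_2}\right)^2 + \cdots + \sum \frac{\partial G_i}{\partial \theta_2} g_i = 0$$

$$\vdots$$

Levenberg points out that the $\Delta_j$ may be so large that successive approximations for $\theta^*$ will yield an $S(\theta^*)$ which is larger than that of the initial solution.

Let $\bar{S}(\theta^*) = W S(\theta^*) + w_1\Delta_1^2 + w_2\Delta_2^2 + \cdots$, where $w_1, w_2, \ldots$ are the weights of the $\Delta_j$ and W expresses the relative importance of the residuals and the $\Delta_j$. Suppose the function $\bar{S}(\theta^*)$ takes its minimum at $\theta_+ = (\theta_{1+}, \theta_{2+}, \ldots)$ and set $Q(\theta) = w_1\Delta_1^2 + w_2\Delta_2^2 + \cdots$. Under the assumption that $S(\theta)$ is not stationary at $\theta = {}_0\theta$, Levenberg obtained the following inequalities: $S(\theta_+) < S({}_0\theta)$ and $Q(\theta_+) < Q(\theta_\infty)$, where $\theta_\infty$ denotes the standard least squares solution.

The normal equations which result from minimizing the expression for $\bar{S}(\theta^*)$ above become

$$\Delta_1 \left[\sum \left(\frac{\partial G_i}{\partial \theta_1}\right)^2 + \frac{w_1}{W}\right] + \Delta_2 \sum \frac{\partial G_i}{\partial \theta_1}\frac{\partial G_i}{\partial \theta_2} + \cdots + \sum \frac{\partial G_i}{\partial \theta_1} g_i = 0$$

$$\Delta_1 \sum \frac{\partial G_i}{\partial \theta_1}\frac{\partial G_i}{\partial \theta_2} + \Delta_2 \left[\sum \left(\frac{\partial G_i}{\partial \theta_2}\right)^2 + \frac{w_2}{W}\right] + \cdots + \sum \frac{\partial G_i}{\partial \theta_2} g_i = 0$$

$$\vdots$$

The best value of W may theoretically be determined by solving $dS(\theta^*)/dW = 0$. $S(\theta^*)$ may be approximated by a Taylor expansion, which gives $S(\theta^*) \approx S({}_0\theta) + W (dS/dW)|_{W=0}$. Setting this expression equal to zero on the assumption that ${}_0\theta$ was chosen so that the decreased value $S(\theta^*)$ is small, the result

$$W = -\frac{S({}_0\theta)}{(dS/dW)|_{W=0}} = \frac{\tfrac{1}{2} S({}_0\theta)}{\left(\sum g_i \frac{\partial G_i}{\partial \theta_1}\right)^2 / w_1 + \left(\sum g_i \frac{\partial G_i}{\partial \theta_2}\right)^2 / w_2 + \cdots}$$

was obtained by Levenberg. If necessary, the value for W may be improved by calculating $S(\theta^*)$ for several values of W so that an approximate minimum may be located.

Concerning the weights $w_1, w_2, \ldots$, a system which has been successfully used is $w_1 = w_2 = \cdots = 1$. Another system which has also proven useful is

$$w_1 = \sum \left(\frac{\partial G_i}{\partial \theta_1}\right)^2, \quad w_2 = \sum \left(\frac{\partial G_i}{\partial \theta_2}\right)^2, \quad \ldots$$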
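The effect of the damping terms can be illustrated with a short Python sketch. As assumptions for the illustration only, the function fitted is $\theta_1 e^{\theta_2 x}$, the residuals are taken as $g_i = f_i(\theta) - y_i$, the weights are $w_1 = w_2 = 1$, and the value W = 1 is chosen arbitrarily rather than from Levenberg's formula. Passing an infinite W removes the damping and gives the standard least squares correction.

```python
import math


def damped_step(x, y, theta1, theta2, W, w1=1.0, w2=1.0):
    """Corrections (delta1, delta2) from the damped normal equations."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for xi, yi in zip(x, y):
        e = math.exp(theta2 * xi)
        g = theta1 * e - yi                 # residual g_i at the current approximation
        d1, d2 = e, theta1 * xi * e         # partial derivatives of G_i
        a11 += d1 * d1
        a12 += d1 * d2
        a22 += d2 * d2
        b1 -= d1 * g
        b2 -= d2 * g
    a11 += w1 / W                           # damping terms on the diagonal
    a22 += w2 / W
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def sum_of_squares(x, y, theta1, theta2):
    return sum((theta1 * math.exp(theta2 * xi) - yi) ** 2 for xi, yi in zip(x, y))


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(-0.5 * xi) for xi in x]   # hypothetical error-free observations
start = (1.5, -0.3)                            # hypothetical initial approximation

damped = damped_step(x, y, start[0], start[1], W=1.0)
undamped = damped_step(x, y, start[0], start[1], W=float("inf"))

q_damped = damped[0] ** 2 + damped[1] ** 2        # Q with w1 = w2 = 1
q_undamped = undamped[0] ** 2 + undamped[1] ** 2
s_start = sum_of_squares(x, y, start[0], start[1])
s_damped = sum_of_squares(x, y, start[0] + damped[0], start[1] + damped[1])
```

In this example the damped corrections are shorter than the undamped ones and still reduce the sum of squares, which is the behaviour expressed by the two inequalities above.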

THE  MODIFIED  GAUSS- NEWTON  METHOD 

Hartley (1961) named the method previously described the Gauss-Newton method and proposed another modification to the method. Hartley (1961) proved that it is possible to describe an iterative process which will always converge to the least squares solution if the following assumptions are satisfied: (a) the non-linear function is continuous and first and second derivatives exist, (b) the observed X matrix is of full rank, and (c) if

$$\bar{Q} = \liminf_{\bar{S}} Q(x; \theta),$$

where $\bar{S}$ is the complement of S, which is a bounded convex set of the coefficient space $\theta_1, \ldots, \theta_k$, it is possible to find a vector ${}_0\theta$ in the interior of S such that $Q(x; {}_0\theta) < \bar{Q}$.

Under the above assumptions Hartley (1961) proposed to start with the usual normal equations obtained by utilizing a first order Taylor series approximation. However, instead of letting the second approximation for $\theta$ be ${}_0\theta + \Delta$, consider $Q(v) = Q(x; {}_0\theta + v\Delta)$, $0 \le v \le 1$, and denote by $v'$ the value of v for which $Q(v)$ is a minimum on the interval $0 \le v \le 1$. Then $v'$ may be found approximately if one evaluates $Q(v)$ for $v = 0$, $v = \tfrac{1}{2}$, and $v = 1$ and determines the value of v for which the parabola through $Q(0)$, $Q(\tfrac{1}{2})$, $Q(1)$ attains its minimum from

$$v' = \tfrac{1}{2} + \tfrac{1}{4}\,\frac{Q(0) - Q(1)}{Q(1) - 2Q(\tfrac{1}{2}) + Q(0)}.$$
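The parabolic formula for $v'$ is easily expressed in code. The following Python sketch is an illustration, not Hartley's program: the restriction of the result to the interval from 0 to 1 and the handling of a parabola without an interior minimum are assumptions added here, and the function name is hypothetical.

```python
def hartley_step(q0, q_half, q1):
    """Approximate minimizer v' of Q(v) on [0, 1] from Q(0), Q(1/2), Q(1)."""
    curvature = q1 - 2.0 * q_half + q0
    if curvature <= 0.0:
        # Assumption of this sketch: the parabola has no interior minimum,
        # so the better of the two end points is taken.
        return 1.0 if q1 < q0 else 0.0
    v = 0.5 + 0.25 * (q0 - q1) / curvature
    # Assumption of this sketch: keep v' inside the interval 0 <= v <= 1.
    return min(1.0, max(0.0, v))


def q(v):
    """Hypothetical sum of squares along the correction, exactly parabolic."""
    return (v - 0.3) ** 2 + 1.0


v_prime = hartley_step(q(0.0), q(0.5), q(1.0))
```

When $Q(v)$ is exactly parabolic, as in this hypothetical example, the formula recovers the minimizing value of v; otherwise it gives only the approximation described above.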


The above procedure is illustrated by Hartley with an example.

The author's experience indicates that the direct application of Hartley's modification does not in all cases lead to a solution in a reasonable amount of time on the International Business Machines 1620 computer.

Further modifications to the method of Hartley are discussed in a master's report by Pence (1963). Pence mentions that Mr. Carlton Hassell and Dr. Dale Cooper, both of the Mathematical Research Section of Continental Oil Company, have worked for some time in this area and plan to publish their results soon. Their computer program, NCMLN, was written for the International Business Machines 7090 computer. Some of the additional provisions of the modified Gauss-Newton method discussed by Pence include elimination of the need for having a starting value for any of the linear coefficients. Also considered are graphing the results of the modified Gauss-Newton method and then by inspection estimating new values for the coefficients and calculating the new sum of squares, and adjusting the non-linear coefficient without having to estimate any linear coefficients.

GENERAL  PROPERTIES  OF  LEAST  SQUARES  ESTIMATES 

Least squares estimates for the coefficients of any function have been shown to be sufficient estimates by Barnard (1963). In general, however, the estimates obtained by the method of least squares in the non-linear problem have no other general optimum properties (Kendall and Stuart (1961)).


The purposes of this report are to examine the early important developments of the method of least squares, to show the statistical properties of the estimates obtained by the method of least squares, and to present the techniques used in applying the method of least squares. Special consideration is given to the non-linear problem.

Legendre was the first to state the method of least squares. Gauss made one of the first attempts to put the method of least squares on a logical foundation. Laplace later added another justification.

There  exists  some  disagreement  among  authors  concerning  the  results 
implied  by  these  early  papers.    R.  L.  Plackett  has  used  matrix  notation 
to  clarify  and  summarise  many  of  the  early  results  in  a  much  referred  to 
paper  published  in  1949. 

Most of the developers of least squares focused on the problem of minimization of the sum of squares of a linear function of known quantities and unknown coefficients. The estimates for this linear problem have desirable statistical properties.

Today, however, with the availability of high-speed electronic computers and the demand for more sophisticated mathematical models in the investigation of new problems, non-linear models are being considered and thoroughly investigated.

One approach to the non-linear problem is the approximation of a non-linear function by applying Taylor's series. Another method is the use of logarithms. Still another technique is to weight the observations or functions of the observations in some manner. A weighting was developed by K. Levenberg of Frankford Arsenal in 1944. This procedure has been called the Gauss-Newton method by H. O. Hartley of Texas A. & M., who modified it and called it the Modified Gauss-Newton method. The author's experience and that of others shows that this Modified Gauss-Newton method will not always yield a solution within a reasonable length of time. Therefore further modifications are needed. This problem has been more recently studied by Dr. Dale Cooper of Continental Oil Company.


